import dalex as dx
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import warnings
warnings.filterwarnings('ignore')
First, divide the data into a feature matrix X and a target variable y.
data = dx.datasets.load_titanic()
X = data.drop(columns='survived')
y = data.survived
data.head(10)
Create the following:

- numeric_features: the numerical features to transform
- numeric_transformer pipeline: imputes and scales the numerical features
- categorical_features: the categorical features to transform
- categorical_transformer pipeline: imputes missing values with the 'missing' string and one-hot encodes the categorical features
- preprocessor: aggregates those two pipelines using ColumnTransformer
- classifier model using MLPClassifier - it has 3 hidden layers with sizes 150, 100, 50 respectively
- clf pipeline model, which combines the preprocessor with the basic classifier model

numeric_features = ['age', 'fare', 'sibsp', 'parch']
numeric_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
]
)
categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
]
)
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, numeric_features),
('cat', categorical_transformer, categorical_features)
]
)
classifier = MLPClassifier(hidden_layer_sizes=(150,100,50), max_iter=500, random_state=0)
clf = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', classifier)])
clf.fit(X, y)
exp = dx.Explainer(clf, X, y, label = "Titanic MLP Pipeline")
The functions above are accessible from the Explainer object through its methods.
Each of them returns a new object that contains a result field in the form of a pandas.DataFrame and a plot method.
This function is just an ordinary model prediction; however, it uses the Explainer interface.
Let's create two example persons for this tutorial.
john = pd.DataFrame({'gender': ['male'],
'age': [25],
'class': ['1st'],
'embarked': ['Southampton'],
'fare': [72],
'sibsp': [0],
'parch': [0]},
index = ['John'])
mary = pd.DataFrame({'gender': ['female'],
'age': [35],
'class': ['3rd'],
'embarked': ['Cherbourg'],
'fare': [25],
'sibsp': [0],
'parch': [0]},
index = ['Mary'])
You can make a prediction on many samples at the same time.
exp.predict(X)[0:10]
You can also predict on a single instance; however, the only accepted input format is a pandas.DataFrame.
Prediction of survival for John.
exp.predict(john)
Prediction of survival for Mary.
exp.predict(mary)
Types of predict_parts explanations: 'break_down', 'break_down_interactions', 'shap'.
This function calculates Variable Attributions as Break Down, iBreakDown or Shapley Values explanations.
The model prediction is decomposed into parts that are attributed to particular variables.
bd_john = exp.predict_parts(john, type='break_down')
bd_interactions_john = exp.predict_parts(john, type='break_down_interactions')
sh_mary = exp.predict_parts(mary, type='shap', B = 10)
bd_john.result.label = "John"
bd_interactions_john.result.label = "John+"
bd_john.result
bd_john.plot(bd_interactions_john)
sh_mary.result.label = "Mary"
sh_mary.result.loc[sh_mary.result['B'] == 0, ]
sh_mary.plot(bar_width = 20)
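To make the decomposition concrete, here is a minimal, self-contained sketch of the break-down idea on a toy additive model (an illustration of the general technique, not dalex internals — all names are made up): variables are fixed one at a time to the explained instance's values, and each variable's contribution is the resulting change in the mean prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
X_bg = rng.normal(size=(1000, 2))            # background dataset
model = lambda X: X[:, 0] + 2 * X[:, 1]      # toy additive model
x = np.array([1.0, -0.5])                    # instance to explain

baseline = model(X_bg).mean()                # mean prediction over the data
contributions = {}
X_fix = X_bg.copy()
prev = baseline
for j in range(2):                           # fix variables left to right
    X_fix[:, j] = x[j]
    cur = model(X_fix).mean()
    contributions[f'x{j}'] = cur - prev      # change attributed to variable j
    prev = cur

# the contributions sum to prediction(x) - baseline
total = sum(contributions.values())
```

For non-additive models the attributions depend on the order in which variables are fixed; this is why 'break_down_interactions' exists and why 'shap' averages attributions over many orderings (the B parameter above controls how many).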
Type of predict_profile explanation: 'ceteris_paribus'.
This function computes individual profiles, also known as Ceteris Paribus Profiles.
cp_mary = exp.predict_profile(mary)
cp_john = exp.predict_profile(john)
cp_mary.result.head()
cp_mary.plot(cp_john)
cp_john.plot(variable_type = "categorical")
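The underlying idea can be sketched without dalex: take one observation, vary a single feature over a grid while holding every other feature fixed, and record the model prediction at each grid point. The toy data and names below (X_toy, profile) are invented for illustration; the _yhat_ column mirrors the naming dalex uses in its result frames.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# toy stand-ins for the fitted pipeline and an observation
rng = np.random.default_rng(1)
X_toy = pd.DataFrame({'age': rng.uniform(1, 80, 200),
                      'fare': rng.uniform(0, 100, 200)})
y_toy = (X_toy['age'] + X_toy['fare'] > 90).astype(int)
model = LogisticRegression().fit(X_toy, y_toy)

instance = X_toy.iloc[[0]]                   # one observation to profile
grid = np.linspace(1, 80, 25)                # grid over 'age'
profile = instance.loc[instance.index.repeat(len(grid))].reset_index(drop=True)
profile['age'] = grid                        # vary 'age', hold 'fare' fixed
profile['_yhat_'] = model.predict_proba(profile[['age', 'fare']])[:, 1]
```

Each row of profile is the same person with a different hypothetical 'age', which is exactly what cp_mary.result contains for every variable.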
This function calculates various Model Performance measures. The model_type parameter takes 'classification' or 'regression'.
mp = exp.model_performance(model_type = 'classification')
mp.result
mp.result.auc[0]
mp.plot()
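The measures reported in mp.result correspond to standard metrics; for instance, AUC and accuracy can be computed directly with scikit-learn. A toy sketch with made-up labels and probabilities, independent of the Titanic model above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

y_true = np.array([0, 0, 1, 1, 1, 0])                 # toy labels
y_prob = np.array([0.2, 0.7, 0.9, 0.6, 0.4, 0.3])     # toy predicted probabilities

auc = roc_auc_score(y_true, y_prob)                   # ranking quality, cutoff-free
acc = accuracy_score(y_true, y_prob >= 0.5)           # accuracy at a 0.5 cutoff
```

AUC depends only on how well positives are ranked above negatives, while accuracy, precision, recall and F1 all depend on the chosen probability cutoff.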
This function calculates Variable Importance. The type parameter takes 'variable_importance', 'ratio', or 'difference'.
vi = exp.model_parts()
vi.result
vi.plot(max_vars=5)
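This importance is permutation-based: a variable matters to the extent that shuffling it degrades the model's loss. A minimal sketch of that idea on a toy regression model (illustrative names, not dalex code):

```python
import numpy as np

rng = np.random.default_rng(2)
X_t = rng.normal(size=(500, 3))
y_t = 3 * X_t[:, 0] + X_t[:, 1] + rng.normal(scale=0.1, size=500)
model = lambda X: 3 * X[:, 0] + X[:, 1]      # toy "fitted" model; ignores x2

def rmse(pred, y):
    return np.sqrt(np.mean((pred - y) ** 2))

base_loss = rmse(model(X_t), y_t)
importance = {}
for j in range(3):
    Xp = X_t.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])     # break the variable-target link
    importance[f'x{j}'] = rmse(model(Xp), y_t) - base_loss
```

Subtracting base_loss corresponds to the 'difference' variant; dividing by it instead gives the 'ratio' variant. A variable the model never uses (x2 here) gets zero importance.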
This function calculates explanations that explore the model response as a function of selected variables. The explanations can be calculated as Partial Dependence Profiles or Accumulated Local Dependence Profiles. The type parameter takes 'partial' or 'accumulated'.
pdp_num = exp.model_profile(type = 'partial')
pdp_num.result["_label_"] = 'pdp'
ale_num = exp.model_profile(type = 'accumulated')
ale_num.result["_label_"] = 'ale'
pdp_num.plot(ale_num)
pdp_cat = exp.model_profile(type = 'partial', variable_type='categorical', variables = ["gender","class"])
pdp_cat.result['_label_'] = 'pdp'
ale_cat = exp.model_profile(type = 'accumulated', variable_type='categorical', variables = ["gender","class"])
ale_cat.result['_label_'] = 'ale'
ale_cat.plot(pdp_cat)
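A partial dependence profile is essentially the average of the ceteris paribus profiles over the whole dataset: for each grid value, set the feature to that value for every observation and average the predictions. A self-contained sketch on a toy model (all names invented):

```python
import numpy as np

rng = np.random.default_rng(3)
X_bg = rng.normal(size=(300, 2))             # background data
model = lambda X: X[:, 0] ** 2 + X[:, 1]     # toy model

grid = np.linspace(-2, 2, 9)
pdp = []
for v in grid:
    Xg = X_bg.copy()
    Xg[:, 0] = v                             # every observation gets x0 = v
    pdp.append(model(Xg).mean())             # average prediction = PD at v
pdp = np.array(pdp)
```

Accumulated dependence ('accumulated') instead aggregates local changes in the prediction within the data distribution, which gives more reliable curves than PDP when features are correlated.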
Theoretical introduction to the plots: Explanatory Model Analysis. Explore, Explain and Examine Predictive Models.
dalex GitHub repository
XAI tools and more: ModelOriented